Actor-Critic Algorithm with Transition Cost Estimation

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Generalized Natural Actor-Critic Algorithm

Policy gradient Reinforcement Learning (RL) algorithms have received substantial attention, seeking stochastic policies that maximize the average (or discounted cumulative) reward. In addition, extensions based on the concept of the Natural Gradient (NG) show promising learning efficiency because these regard metrics for the task. Though there are two candidate metrics, Kakade’s Fisher Informat...

متن کامل

An Actor-Critic Algorithm for Sequence Prediction

We present an approach to training neural networks to generate sequences using actor-critic methods from reinforcement learning (RL). Current log-likelihood training methods are limited by the discrepancy between their training and testing modes, as models must generate tokens conditioned on their previous guesses rather than the ground-truth tokens. We address this problem by introducing a cri...

متن کامل

An actor-critic algorithm with function approximation for discounted cost constrained Markov decision processes

We develop in this article the first actor–critic reinforcement learning algorithm with function approximation for a problem of control under multiple inequality constraints. We consider the infinite horizon discounted cost framework in which both the objective and the constraint functions are suitable expected policy-dependent discounted sums of certain sample path functions. We apply the Lagr...

متن کامل

Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning

To improve the convergence rate and the sample efficiency, two efficient learning methods AC-HMLP and RAC-HMLP (AC-HMLP with ℓ2-regularization) are proposed by combining actor-critic algorithm with hierarchical model learning and planning. The hierarchical models consisting of the local and the global models, which are learned at the same time during learning of the value function and the polic...

متن کامل

Hierarchical Actor-Critic

The ability to learn at different resolutions in time may help overcome one of the main challenges in deep reinforcement learning — sample efficiency. Hierarchical agents that operate at different levels of temporal abstraction can learn tasks more quickly because they can divide the work of learning behaviors among multiple policies and can also explore the environment at a higher level. In th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: The International Journal of Fuzzy Logic and Intelligent Systems

سال: 2016

ISSN: 1598-2645,2093-744X

DOI: 10.5391/ijfis.2016.16.4.270